System and method for regular expression generation for improved data transfer

ABSTRACT

A system and method for generating regular expressions to identify vendors to enable improved financial data transfer from a first computer system to a second computer system is provided.

CROSS-REFERENCE TO RELATED APPLICATIONS

The present application is a continuation of commonly assigned copending U.S. patent application Ser. No. 16/506,012, which was filed on Jul. 9, 2019, by Edward Solovey et al. for SYSTEM AND METHOD FOR REGULAR EXPRESSION GENERATION FOR IMPROVED DATA TRANSFER, which is hereby incorporated by reference.

BACKGROUND

There exists a plurality of differing financial reporting systems that may be utilized by an enterprise for managerial or other purposes. Examples of such systems include the Quickbooks® suite of software. Such software typically enables a user to enter financial transactions and to classify each transaction as affecting one or more categories identified in a predefined chart of accounts. Earlier systems typically required manual entry of all of the financial data; however, more modern systems may retrieve raw financial data from one or more financial service providers, such as banks, credit card companies, etc.

A noted disadvantage of current systems is that the incoming raw financial data received from various financial service providers may not contain sufficient information to identify a particular transaction, particularly the vendor associated with a transaction, in an automated manner. Typically, a financial services company, such as a credit card servicer, may provide electronic information relating to the transaction. However, the information provided includes a text field that is not arranged in any standard format. This unstructured format reduces the ability to perform automated data transfer from a financial services company into an electronic financial reporting system. Effectively, the data may be transferred from the financial services company to the financial reporting system, but then requires human intervention to properly classify or label the financial data. This significantly reduces throughput and causes the financial reporting system to be idle while waiting for humans to classify the financial data.

More generally, when transferring data from a first computer system to a second computer system, if portions of (or all of) the data associated with a particular transaction or entry do not have a well-defined format or structure, the computer systems may not be able to effectively transfer the data from a first format to a second format. While this is commonly seen in relation to financial data, this noted disadvantage arises in other data communication/transfer environments. Thus, there is a noted disadvantage of computer systems that are trying to communicate using non-structured data formats. The present invention enables automated association of a regular expression identifying transaction or other data entries with a source identifier, e.g., a vendor.

SUMMARY

The noted disadvantages of the prior art relating to the communication between a first computer system and a second computer system using data that is not in a highly structured format are overcome by the novel system and method for improved regular expression generation described herein. The second computer system generates a mapping data structure (vendor map) utilized to associate particular regular expressions with a particular vendor. Historical data that illustratively includes such mappings is input into the second computer system, which then clusters the data. Each cluster is then converted into regular expressions.

The converted regular expressions are then analyzed to determine if they clash with previously generated regular expressions, i.e., a single transaction data set would match two or more regular expressions. If there are clashes, the clustering threshold is updated and new clusters are created. If there are no clashes, the tokens within the regular expressions are evaluated for uniqueness to determine whether they are sufficiently unique.

In operation, new input data, such as a new financial transaction, is received at the second computer from the first computer. The second computer applies the previously generated regular expressions to determine if the new input data matches one of the regular expressions. If it does match, the system associates the new input data with the entity, such as a vendor, that is associated with the matching regular expression. The input data may then be appropriately flagged or otherwise categorized based on the entity.
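By way of illustration only, the matching step described above may be sketched in Python as follows. The pattern table, the vendor names, and the function name `identify_vendor` are hypothetical and are not taken from the described system; in the described embodiments the regular expressions would be generated automatically rather than written by hand.

```python
import re
from typing import Dict, Optional

# Hypothetical pattern-to-vendor table; in the described system the patterns
# would be produced by the generation procedure rather than written by hand.
VENDOR_PATTERNS: Dict[str, str] = {
    r"BEST BUY \d+ .+": "Best Buy",
    r"RIDESHARE TRIP \w+": "Example Ride Share",
}


def identify_vendor(description: str, patterns: Dict[str, str]) -> Optional[str]:
    """Return the vendor whose pattern matches the whole description, if any."""
    for pattern, vendor in patterns.items():
        if re.fullmatch(pattern, description):
            return vendor
    return None  # no match: the transaction would need other handling


print(identify_vendor("BEST BUY 00010454 SAN FRANCISCO CA", VENDOR_PATTERNS))
print(identify_vendor("UNKNOWN MERCHANT 42", VENDOR_PATTERNS))
```

The sketch returns the first matching vendor, which is only safe under the assumption, stated elsewhere herein, that no transaction matches more than one regular expression.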

By automatically associating the input transaction with the vendor according to the one or more embodiments described herein, human intervention is not required to properly classify or label the input transaction. Advantageously, throughput at the financial reporting system is increased. Therefore, the one or more embodiments described herein provide an improvement in the existing technological field associated with financial reporting systems since an input transaction received at the financial reporting system may be automatically and systematically associated with a vendor.

In addition, the one or more embodiments described herein have a practical application since the financial reporting system may not have to rely on human intervention and the input transaction may be automatically and systematically associated with a vendor.

BRIEF DESCRIPTION OF THE DRAWINGS

The above and further advantages of illustrative embodiments of the present invention may be better understood by referring to the following description in conjunction with the accompanying drawings in which like reference numerals indicate identical or functionally similar elements:

FIG. 1 is a schematic diagram of an exemplary network environment in accordance with an illustrative embodiment of the present invention;

FIG. 2 is a schematic block diagram of an exemplary transaction server in accordance with an illustrative embodiment of the present invention;

FIG. 3 is a block diagram of exemplary transaction management software in accordance with an illustrative embodiment of the present invention;

FIG. 4 is a diagram of an exemplary vendor map data structure in accordance with an illustrative embodiment of the present invention;

FIG. 5 is a flowchart detailing the steps of a procedure for generating regular expressions for classification of transaction information in accordance with an illustrative embodiment of the present invention;

FIG. 6 is a flowchart detailing the steps of a procedure for converting clusters of transactions into regular expressions in accordance with an illustrative embodiment of the present invention;

FIG. 7 is a flowchart detailing the steps of a procedure for vendor identification in accordance with an illustrative embodiment of the present invention; and

FIG. 8 is a flowchart detailing the steps of a procedure for assigning a category to a particular transaction in accordance with an illustrative embodiment of the present invention.

DETAILED DESCRIPTION OF AN ILLUSTRATIVE EMBODIMENT

FIG. 1 is a schematic diagram of an exemplary network environment 100 in which the principles of the present invention may be implemented in accordance with an illustrative embodiment of the present invention. The environment 100 is illustratively centered on a network 105. In one exemplary embodiment, the network 105 comprises the well-known Internet. However, it should be noted that, in alternative embodiments, network 105 may comprise one or more networks that may or may not be interconnected. As such, the description of a single network 105 should be taken as exemplary only. It is expressly contemplated that the network 105 may comprise a series of interconnecting networks to enable data to flow among any of the elements that are operatively interconnected. For example, a local area network (LAN) may be directly connected to the user computer 125. That LAN may then be interconnected with a wide area network (WAN), such as the Internet, to be operatively interconnected with other elements of the network environment 100. Further, it should be noted that network 105 may be wired and/or wireless in various portions.

A financial transaction server 200, described in further detail below in relation to FIG. 2, is operatively interconnected with the network 105. The financial transaction server 200 executes software to provide a financial reporting system to one or more enterprises. In operation, the financial reporting system gathers various financial data from the other systems connected to network 105 and provides financial reporting and analysis to users of the system. The financial transaction server 200 may support a plurality of differing enterprises at once. That is, the financial transaction server 200 may support users at enterprise A (a software company) and at enterprise B (a law firm). As will be detailed further below, the aggregation of data across a plurality of clients of the financial transaction server 200 may enable improved processing by the financial transaction server.

Operatively interconnected to network 105 are a plurality of bank servers 110. It should be noted that while the term bank is utilized in relation to bank servers 110, other financial services companies may also be interconnected in accordance with the principles of the present invention. Therefore, it is expressly contemplated that the term bank should be taken to encompass more generally any other financial service providers such as, inter alia, brokerage firms, credit unions, alternative financial providers, such as PayPal, etc. In accordance with an illustrative embodiment of the present invention, the bank servers 110 provide, via the network 105, financial transaction information to financial transaction server 200, as described further below.

Also operatively interconnected with the network 105 are one or more credit card servers 115. Similar to bank servers 110, credit card servers 115 provide credit card transaction information to the financial transaction server 200 in accordance with an illustrative embodiment of the present invention. Such information may identify a date that a transaction was processed, a payee, a dollar amount, a type of transaction or category of transaction, an unformatted text field, etc. Again, similar to bank servers 110, credit card servers 115 may provide such information in either a batch process or in substantially real time. It should be further noted that while the term credit card is used herein, it is expressly contemplated that charge cards, pre-paid cards, and the like may be utilized in alternative embodiments. Therefore, the use of the term credit card should be taken as exemplary only.

One or more vendor servers 120 may also be interconnected with the network 105. The vendor servers 120 may be associated with particular vendors of the enterprise. As used herein, a vendor may comprise an organization or entity that provides goods and/or services to the enterprise and which receives payment from the enterprise. For example, a vendor may be a ride sharing company that is utilized by employees of the enterprise when traveling for business. Similarly, a vendor may comprise a company that provides goods, such as an office supply company. In accordance with an illustrative embodiment of the present invention, vendor servers 120 may provide detailed information about particular transactions to the financial transaction server 200. Such information may include, for example, a transaction identifier, a listing of goods or services provided, a total cost, and, in alternative embodiments, additional information relating to the transaction.

One or more user computers 125 are also operatively interconnected with the network 105. Illustratively, the user computer 125 may be a computer executing web-based access software (not shown) to enable a user to communicate with the financial transaction server 200, described further below in reference to FIG. 2. More generally, a user computer may be implemented as a smartphone, tablet, laptop, desktop or other computing device that is capable of interfacing with financial transaction server 200. In operation, typically there will be a plurality of user computers interacting with the server. For example, a chief financial officer may be utilizing his or her computer to access the server 200, while a department head may be using his or her computer to access transaction information relating to a particular department. Therefore, while a single user computer 125 is shown, it is expressly contemplated that a plurality of user computers 125 may be utilized in accordance with an illustrative embodiment of the present invention. Further, the form and type of the user computer may vary. Therefore, the term user computer should be interpreted broadly to encompass any such device enabling interaction with the server.

FIG. 2 is a schematic block diagram of a financial transaction server 200 that illustratively comprises one or more processors 210, a memory 215, a network adapter 220, a display driver 235, and a storage adapter 225 interconnected by a system bus 205. It should be noted that certain components are shown and described herein, but that the principles of the present invention may be utilized in computing environments having differing configurations as long as similar functionality is obtained.

The memory illustratively comprises storage locations that are addressable by the processor 210 and adapters for storing software program code and data structures associated with the present invention. The processor and adapters may, in turn, comprise processing elements and/or logic circuitry configured to execute the software code and manipulate the data structures.

One or more processors 210 provide the processing power to the server. In accordance with various alternative embodiments of the present invention, there may be one or more processors, each having one or more cores. Further, it should be expressly noted that while the server 200 is shown as a single entity, in alternative embodiments the functionality may be divided among a plurality of computing devices. Therefore, the description of a singular server 200 should be taken as exemplary only. The memory 215 may be utilized to store program code, software and/or data structures for operation by the processor.

The memory 215 stores an operating system (not shown) and the transaction management software 300, described below in reference to FIG. 3, for execution by the processor 210. It should be noted that in alternative embodiments various operating systems may be utilized and, in embodiments wherein the server is executing over a plurality of computing devices, each computing device is not required to execute the same operating system.

The network adapter comprises a plurality of ports adapted to couple the server to one or more entities over a network 105. The network adapter thus may comprise mechanical, electrical and signaling circuitry needed to connect the system to the network 105. Illustratively, a plurality of network adapters may be utilized depending on the required bandwidth between the server 200 and network 105. Further, it is expressly contemplated that various network adapters may vary in connectivity type. For example, a first network adapter may be a conventional Ethernet adapter connecting the server 200 to the Internet. However, it is expressly contemplated that a second network adapter 220 may be a wireless network adapter enabling user computers that are in close vicinity to the server to access it without transmitting over a wired network. As will be appreciated by those skilled in the art, a variety of network adapters and configurations are possible to meet the needs of an enterprise's network security, bandwidth, and geographic distribution. Therefore, the description of a network adapter 220 should be taken as exemplary and be interpreted broadly.

The storage adapter cooperates with the server to access information requested on storage devices 230. Information may be stored on any type of attached array of writable storage device media such as, for example, video tape, optical, DVD, magnetic tape, bubble memory, electronic random access memory, disk drives, flash drives, or any other similar media adapted to store information. As illustratively described herein, the information is stored on storage devices, such as disks 230. The storage adapter comprises a plurality of ports having input/output interface circuitry that couples to the disks 230. It should be noted that in alternative embodiments, a plurality of different types of storage devices 230 may be utilized for a single server. For example, some data may be stored on a solid-state storage device, whereas other data is stored on magnetic media. Therefore, the description of disks as storage devices should be taken as exemplary only.

Display driver 235 provides output information to a display 240. In accordance with an illustrative embodiment of the present invention, a display 240, such as a conventional computer monitor, may be provided for displaying information from software 300, etc. However, in alternative embodiments of the present invention such information is transmitted over network 105 to a user's computer 125 to be displayed at the user computer.

FIG. 3 is a diagram illustrating exemplary modules of the transaction management software 300 in accordance with an illustrative embodiment of the present invention. It should be noted that while the various components may be referred to as modules, it is expressly contemplated that differing software arrangements may be utilized in accordance with alternative embodiments of the present invention. Therefore, the use of the term module should be taken as exemplary only.

Illustratively, the transaction management software implements a financial reporting and modeling system that aggregates data from various financial service providers and provides modeling capabilities. In an illustrative embodiment, the software provides conventional accounting and reporting functionality. However, in alternative embodiments, the software may rely on other software, such as Quickbooks, to provide the basic accounting functionality.

The system 300 illustratively comprises a classification engine 305, a vendor map module 310, an e-mail module 315, a rules engine 320, a projection engine 325, a database module 330, a transaction engine 335, and a user interface module 340. It should be noted that, in accordance with alternative embodiments of the present invention, various other and/or differing modules may be implemented. Further, the functionality described herein in relation to each of the various modules may be merged and/or rearranged in alternative embodiments. As such, the description of the functionality being performed by a specific module should be taken as exemplary only.

The classification engine 305 operates to retrieve and classify each transaction by assigning it to one or more categories for use by the system. For example, a credit card transaction is analyzed and identified as being associated with a particular vendor. The classification engine 305 may utilize the vendor map module 310 to identify the category or categories with which the transaction should be associated.

The vendor map module 310 contains a vendor map 400, described below in relation to FIG. 4, that provides associations between particular vendors and categories in accordance with an illustrative embodiment of the present invention.

The e-mail module 315 illustratively analyzes a user's e-mail to identify e-mails from vendors. E-mails from vendors may be analyzed to identify details associated with a particular transaction. For example, a user may purchase an item from an online vendor. The classification engine may receive the financial transaction from, for example, a credit card server. However, without further information, the granularity of data is limited to the fact that a transaction for a particular amount with a particular vendor has occurred. The e-mail module may, in the illustrative embodiments, skim a user's e-mail to identify an e-mailed receipt that may be matched with a particular financial transaction to enable the specific goods or services that were purchased to be input into the system.

A rules engine 320 manages the overall system to ensure that transaction information is updated in accordance with a user-defined preference.

The projection engine 325 illustratively manages financial projections in accordance with an illustrative embodiment of the present invention. In one exemplary embodiment, a user may define a projection based on a given set of assumptions.

A database module 330 tracks and manages the various data structures utilized by the software 300. It should be noted that while it is identified as a database module, in accordance with exemplary embodiments of the present invention, other types of data structures may be utilized. Therefore, the term database should be taken as exemplary only.

The transaction engine 335 manages the input of various financial transactions into the software 300.

The user interface module 340 illustratively provides a web-based user interface to the software. In accordance with alternative embodiments of the present invention, the UI may provide a localized display. Therefore, the description of the user interface being web-based should be taken as exemplary only.

FIG. 4 is a diagram of an exemplary vendor map data structure 400 in accordance with an illustrative embodiment of the present invention. It should be noted that while the vendor map data structure 400 is described herein with a particular format, in accordance with alternative embodiments of the present invention, the vendor map data structure 400 may be implemented using other configurations. As such, the description contained herein of a particular data structure format should be taken as exemplary only. It should be noted that in alternative embodiments, a plurality of data structures may be utilized to store the various information described herein.

The vendor map data structure 400 includes a plurality of entries 405A, B, C, each of which is associated with a particular vendor. Each entry 405 contains a vendor identification field 410. The vendor identification field 410 may contain the name of the vendor, contact information, a unique vendor ID, or other information utilized by the transaction system to identify a particular vendor.

Further associated with each entry 405 is a client field 415. Illustratively, the client field 415 contains identifiers for each client that is associated with the transaction system and that utilizes a particular vendor. The local/global flag 420 is utilized by the system to determine whether or not a particular entry 405 and the associated regular expressions are local or global, as described further below. A set of regular expressions 425 that are associated with a particular vendor are stored. The regular expressions are utilized by the transaction management system as described further below to convert incoming data from a financial service provider for identification and classification purposes. The use of the regular expressions in accordance with illustrative embodiments of the present invention enables improved processing by the automated financial transaction systems by reducing the need for manual intervention and enabling automated data transfer between a financial service provider and the automated accounting system.

Entry 405 may include one or more alias fields 430. Alias fields 430 may be utilized to store particular aliases associated with a vendor. These may be variations of a vendor's name, such as “Amazon.com” versus “Amazon, Inc”. The alias fields 430 enable the transaction management system to correctly identify a particular vendor even if transaction information received from a financial services company utilizes a variation of the vendor's name.

A classification field 435 is utilized to store information relating to how to classify transactions from a particular vendor in accordance with illustrative embodiments of the present invention. Illustratively, this classification information may include amount classification information 440 and/or time classification information 445. As can be seen in exemplary data structure 400, entry 405A includes both amount 440 and time 445 information, while entry 405B only includes amount information 440B.
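By way of illustration only, one entry 405 of the vendor map data structure 400 might be represented in Python as shown below. The class name, the field names, the field types, and all example values are hypothetical; the description above specifies only which fields exist, not how they are encoded.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class VendorMapEntry:
    """One entry 405 of the vendor map 400 (names and types are assumptions)."""
    vendor_id: str                                       # vendor identification field 410
    clients: List[str] = field(default_factory=list)     # client field 415
    is_global: bool = False                              # local/global flag 420
    regexes: List[str] = field(default_factory=list)     # regular expressions 425
    aliases: List[str] = field(default_factory=list)     # alias fields 430
    amount_classification: Optional[str] = None          # amount information 440
    time_classification: Optional[str] = None            # time information 445


# An entry carrying both amount and time classification information.
entry_a = VendorMapEntry(
    vendor_id="Amazon",
    clients=["client-1"],
    regexes=[r"AMAZON .+"],                  # placeholder pattern
    aliases=["Amazon.com", "Amazon, Inc"],
    amount_classification="placeholder-amount-rule",
    time_classification="placeholder-time-rule",
)

# An entry carrying only amount classification information.
entry_b = VendorMapEntry(vendor_id="Best Buy",
                         amount_classification="placeholder-amount-rule")
```

Optional fields default to empty values so that an entry such as `entry_b` may omit the time classification, mirroring the difference between entries 405A and 405B.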

FIG. 5 is a flowchart detailing the steps of a procedure 500 for generating regular expressions in accordance with an illustrative embodiment of the present invention. The procedure 500 begins in step 505 and continues to step 510 where historical data is input. This step is utilized to obtain the raw data necessary to generate the regular expressions for improved data transfer in accordance with an illustrative embodiment of the present invention. Illustratively, the historical data has been pre-categorized by hand while a particular user was utilizing a previous accounting/financial software package. As each transaction is identified with a particular vendor, the present invention may utilize this a priori knowledge to generate the regular expressions to enable improved operation of a financial system.

As noted above, each transaction that is imported into the transaction management system contains a text description field that is not highly structured. Illustratively, there is no set standard format for the contents of this free-form text field. That is, each vendor may include any desired information within that field. Further, vendors may vary the information contained within the free-form text field. Other fields associated with the transaction may include an amount, a currency identifier, etc. These fields associated with an input transaction are more well-defined and do not need to be analyzed like the text description field. Illustratively, the models that are generated by the transaction modeling system are generated on a per-customer level.

The procedure then moves to step 515 where clusters of the imported data are created. An exemplary entity that is being used to obtain financial transaction data is a local bank with which a particular customer has a banking relationship. Six exemplary transactions are illustrated below:

-   INTERNAL TRANSFER INT TRANSFER FROM 1234
-   INTERNAL TRANSFER INT TRANSFER FROM 5678
-   AUTOMATIC PAYMENT—THANK YOU
-   AUTOMATIC PAYMENT—THANK YOU
-   ZERO BAL TRF DEBIT INT TRANSFER TO 9001
-   ZERO BAL TRF DEBIT INT TRANSFER TO 9002

To a human, it is readily apparent that these transactions may be grouped into three different sets. However, automating this clustering may be difficult. In accordance with an illustrative embodiment, the system utilizes the Levenshtein Distance to generate clusters. In a typical scenario where more clusters are desired, the six illustrative transactions may be arranged into three clusters as such:

-   CLUSTER 1:
    -   INTERNAL TRANSFER INT TRANSFER FROM 1234
    -   INTERNAL TRANSFER INT TRANSFER FROM 5678
-   CLUSTER 2:
    -   AUTOMATIC PAYMENT—THANK YOU
    -   AUTOMATIC PAYMENT—THANK YOU
-   CLUSTER 3:
    -   ZERO BAL TRF DEBIT INT TRANSFER TO 9001
    -   ZERO BAL TRF DEBIT INT TRANSFER TO 9002

However, should the clustering threshold be set so that fewer distinct clusters are generated, the same six transactions may be clustered as such:

-   CLUSTER 1:
    -   INTERNAL TRANSFER INT TRANSFER FROM 1234
    -   INTERNAL TRANSFER INT TRANSFER FROM 5678
    -   ZERO BAL TRF DEBIT INT TRANSFER TO 9001
    -   ZERO BAL TRF DEBIT INT TRANSFER TO 9002
-   CLUSTER 2:
    -   AUTOMATIC PAYMENT—THANK YOU
    -   AUTOMATIC PAYMENT—THANK YOU

As will be appreciated by those skilled in the art, clustering with the Levenshtein Distance includes a similarity threshold that may vary between 0 and 100. A threshold value of 0 causes a single cluster to be generated, while a value of 100 requires strings to be identical in order to be placed into the same cluster. In accordance with an illustrative embodiment of the present invention, the similarity threshold is initially set at a value between 75 and 85, and later adjusted by the software as needed and as described herein. In accordance with alternative embodiments, other clustering techniques may be utilized that offer user-defined variables to adjust the number of clusters.
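By way of illustration only, threshold-based clustering of the six exemplary transactions may be sketched in Python as follows. The normalization of the edit distance to a 0-100 similarity score and the greedy assignment of each string to the first sufficiently similar cluster are assumptions made for this sketch; the description above does not specify either, and the function names are hypothetical.

```python
from typing import List


def levenshtein(a: str, b: str) -> int:
    """Classic dynamic-programming edit distance (insert, delete, substitute)."""
    previous = list(range(len(b) + 1))
    for i, ch_a in enumerate(a, start=1):
        current = [i]
        for j, ch_b in enumerate(b, start=1):
            cost = 0 if ch_a == ch_b else 1
            current.append(min(previous[j] + 1,          # deletion
                               current[j - 1] + 1,       # insertion
                               previous[j - 1] + cost))  # substitution
        previous = current
    return previous[-1]


def similarity(a: str, b: str) -> float:
    """Assumed normalization: 100 for identical strings, lower as they diverge."""
    longest = max(len(a), len(b))
    if longest == 0:
        return 100.0
    return 100.0 * (1 - levenshtein(a, b) / longest)


def cluster(descriptions: List[str], threshold: float) -> List[List[str]]:
    """Greedy clustering: join the first cluster whose first member is similar enough."""
    clusters: List[List[str]] = []
    for text in descriptions:
        for members in clusters:
            if similarity(text, members[0]) >= threshold:
                members.append(text)
                break
        else:
            clusters.append([text])  # no cluster was close enough
    return clusters


TRANSACTIONS = [
    "INTERNAL TRANSFER INT TRANSFER FROM 1234",
    "INTERNAL TRANSFER INT TRANSFER FROM 5678",
    "AUTOMATIC PAYMENT\u2014THANK YOU",
    "AUTOMATIC PAYMENT\u2014THANK YOU",
    "ZERO BAL TRF DEBIT INT TRANSFER TO 9001",
    "ZERO BAL TRF DEBIT INT TRANSFER TO 9002",
]

for threshold in (0, 80, 100):
    print(threshold, len(cluster(TRANSACTIONS, threshold)))
```

Under these assumptions, a threshold of 0 places every string in one cluster and a threshold of 100 groups only identical strings, consistent with the behavior described above.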

Each cluster is then converted into one or more regular expressions in step 600 as is further described below in relation to FIG. 6. It should be noted that in accordance with alternative embodiments of the present invention, techniques other than that disclosed in relation to FIG. 6 may be utilized to convert the clusters to regular expressions. Therefore, the disclosure in relation to procedure 600 should be taken as exemplary only.

After the regular expressions have been generated, a determination is made, in step 520, whether there are any clashes among the generated regular expressions and any pre-existing regular expressions. An exemplary clashing regular expression is one in which a single transaction would match two or more regular expressions. Thus, the system would be unable to determine which of the plurality of regular expressions should be utilized. More generally, in accordance with illustrative embodiments of the present invention, any transaction should only match a single regular expression. Should a transaction match a plurality of regular expressions, the transaction management system may not be able to identify a unique vendor to associate with the transaction. That would complicate the operation of the transaction management server and could create corrupted data.
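By way of illustration only, a clash check may be sketched in Python as follows. This sketch assumes that clashes are detected empirically, by testing a set of known transaction descriptions against every regular expression; the description above does not state how the determination of step 520 is implemented. The pattern names, the patterns themselves, and the function name `find_clashes` are hypothetical.

```python
import re
from typing import Dict, List, Tuple


def find_clashes(patterns: Dict[str, str],
                 descriptions: List[str]) -> List[Tuple[str, List[str]]]:
    """Return each description that fully matches two or more patterns."""
    compiled = {name: re.compile(pattern) for name, pattern in patterns.items()}
    clashes = []
    for text in descriptions:
        hits = [name for name, regex in compiled.items() if regex.fullmatch(text)]
        if len(hits) > 1:
            clashes.append((text, hits))
    return clashes


# Hypothetical patterns: "any_transfer" is deliberately too broad.
PATTERNS = {
    "internal_transfer": r"INTERNAL TRANSFER INT TRANSFER FROM \d+",
    "any_transfer": r".*INT TRANSFER.*",
    "zero_balance": r"ZERO BAL TRF DEBIT INT TRANSFER TO \d+",
}

SAMPLES = [
    "INTERNAL TRANSFER INT TRANSFER FROM 1234",
    "ZERO BAL TRF DEBIT INT TRANSFER TO 9001",
    "AUTOMATIC PAYMENT",
]

for text, names in find_clashes(PATTERNS, SAMPLES):
    print(text, "->", names)
```

A sample-based check of this kind can only reveal clashes exhibited by the descriptions supplied to it; it does not prove that two regular expressions can never match the same string.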

If there are clashes, the procedure branches to step 525 where the clustering threshold is updated. As will be appreciated by those skilled in the art, the clustering threshold level may be raised or lowered to fine-tune the number of clusters generated. When using the Levenshtein Distance, the clustering value may be adjusted. In accordance with an illustrative embodiment of the present invention, the similarity threshold may first be set to a value of 75-85 and may be dynamically adjusted by the system thereafter.

The procedure then branches back to step 515. If, in step 520, there are no clashes with the regular expressions, the procedure moves to step 530 where token uniqueness is evaluated. Tokens are described in more detail below in relation to FIG. 6. Token uniqueness may be evaluated both locally and globally. The evaluation of token uniqueness is an aid to determine the risk of false positives, i.e., the likelihood that a regular expression will match an input description that is not associated with the vendor with which the regular expression is associated. This test may be performed by analyzing the number of static tokens in the regular expression as well as the uniqueness of each token.

A determination is made in step 535 whether the uniqueness of the tokens is acceptable. If the uniqueness is not acceptable, the procedure branches back to step 515. If the uniqueness is acceptable, the procedure continues to step 540 where the model is updated. Illustratively, the updating of the model may include, for example, updating the vendor map data structure to include the newly developed regular expressions, etc. More generally, updating the model associates the generated regular expression with a source identifier. In an illustrative embodiment, the source identifier is a vendor and the association is stored in the vendor map data structure. However, in alternative embodiments, the source identifier may identify an entity other than a vendor. As such, the description of associating a generated regular expression with a particular vendor should be taken as exemplary only. The procedure then completes in step 545.

FIG. 6 is a flowchart detailing the steps of the procedure 600 for generating regular expressions in accordance with an illustrative embodiment of the present invention. The procedure 600 begins in step 605 and continues to step 610 where each input string is tokenized. The following four exemplary transactions are used as the procedure 600 is described:

-   -   BEST BUY 00010454 SAN FRANCISCO CA
    -   BEST BUY 00010909 BURBANK CA
    -   BEST BUY 00014008 TEMECULA CA
    -   BEST BUY 00023969 SAN JOSE CA

In operation, these would have been tagged (while part of the historical data that is input into the system) as being associated with the vendor “Best Buy”.

Tokenization occurs by splitting each string into parts separated by whitespace characters. In accordance with alternative embodiments of the present invention, the strings may be split by use of other characters, e.g., underscores, etc. It should be noted that while the use of whitespace is described herein, the principles of the present invention may be utilized with other dividers, including embodiments where a plurality of dividers are utilized. As such, the description of the use of whitespace should be taken as exemplary only.
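By way of example only, tokenization on whitespace, or on an assumed set of alternative dividers, might be sketched as follows. The function name `tokenize` and its `dividers` parameter are hypothetical.

```python
import re


def tokenize(description, dividers=None):
    """Split a description into tokens.

    With no dividers given, split on runs of whitespace. Otherwise split on
    runs of any character found in `dividers` (e.g. "_ " for underscore or space).
    """
    if dividers is None:
        return description.split()
    pattern = "[" + re.escape(dividers) + "]+"
    # Drop empty strings produced by leading or trailing dividers.
    return [token for token in re.split(pattern, description) if token]
```

For instance, `tokenize("BEST BUY 00010454 SAN FRANCISCO CA")` yields six tokens, beginning with BEST and ending with CA.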

A position map is then generated for each input string in step 615. Each of the tokens is assigned a position based on its location within the string. For example, the first exemplary string:

-   -   BEST BUY 00010454 SAN FRANCISCO CA

would have a position map of:

-   -   BEST→1
    -   BUY→2
    -   00010454→3
    -   SAN→4
    -   FRANCISCO→5
    -   CA→6
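A minimal sketch of building such a position map is given below. It assumes whitespace tokenization and 1-based positions, and, as a further assumption not stated in the description, records only the first position of a token that occurs more than once. The name `position_map` is hypothetical.

```python
def position_map(description):
    """Map each token to its 1-based position within the description."""
    positions = {}
    for position, token in enumerate(description.split(), start=1):
        # Assumption: keep the first position when a token repeats.
        positions.setdefault(token, position)
    return positions
```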

Static or common tokens are then identified in step 620. This may be accomplished by, e.g., for each token checking to see if it appears in every other string in the cluster. If the token does appear in every other string, then it is deemed to be a match. Using the sample data set, the token BEST would match every other string at position 1, while BUY would match every string at position 2. 00010454, SAN, and FRANCISCO would not match all other strings, so they would be discarded. Finally, the token CA would match all other strings. Some of these matches are at position 5 and others at position 6. It should be noted that the exact positions are not relevant, but instead the fact that the token matches each other string, even if in differing locations, is determinative of a token being retained or discarded.
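The identification of static tokens described above may be sketched as follows, assuming whitespace tokenization and ignoring positions, consistent with the note that exact positions are not relevant. The function name `static_tokens` is hypothetical.

```python
def static_tokens(cluster):
    """Return tokens of the first string that appear in every other string.

    Order follows the first string; positions in the other strings are ignored.
    """
    if not cluster:
        return []
    token_lists = [text.split() for text in cluster]
    first, rest = token_lists[0], token_lists[1:]
    other_sets = [set(tokens) for tokens in rest]
    kept = []
    for token in first:
        if token not in kept and all(token in tokens for tokens in other_sets):
            kept.append(token)
    return kept
```

Applied to the four exemplary transactions, this sketch retains BEST, BUY, and CA and discards the store numbers and city names.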

In this example, the tokens BEST, BUY, and CA are found in each of the strings of data in the cluster. Therefore, these three tokens are deemed to be static/common for this cluster. One or more regular expressions are then generated from the identified static/common tokens in step 625. The regular expression is illustratively generated by concatenating all of the tokens that survived the matching with wildcard capture areas between them. In the example described herein, the regular expression would be:

-   -   (.*) BEST BUY (.*) CA (.*)

where (.*) is a wildcard identifier.
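One possible way to produce the exemplary expression is sketched below. The sketch assumes that static tokens which are directly adjacent in every string of the cluster (here BEST and BUY) are joined by a single space, while other neighbors are separated by a wildcard capture area; this adjacency rule is an assumption inferred from the example, and the name `build_regex` is hypothetical. Escaping tokens with `re.escape` is likewise an addition of the sketch.

```python
import re


def build_regex(cluster, tokens):
    """Join static tokens with ' (.*) ', or ' ' when always adjacent."""
    token_lists = [text.split() for text in cluster]

    def adjacent_everywhere(left, right):
        # True when `right` directly follows `left` in every string.
        return all(
            any(a == left and b == right for a, b in zip(toks, toks[1:]))
            for toks in token_lists
        )

    parts = ["(.*)"]
    for index, token in enumerate(tokens):
        if index == 0:
            parts.append(" ")
        elif adjacent_everywhere(tokens[index - 1], token):
            parts.append(" ")
        else:
            parts.append(" (.*) ")
        parts.append(re.escape(token))  # escape any regex metacharacters
    parts.append(" (.*)")
    return "".join(parts)
```

Note that, in a standard regular expression engine such as Python's `re`, the literal spaces next to the outer wildcards require a space before BEST and after CA. A description that begins with BEST would therefore match only if it is first padded with a leading and trailing space, or if an engine-specific variant of the expression is used.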

The procedure then completes in step 630. It should be noted that the procedure 600 described herein is an illustrative embodiment of a technique for generating regular expressions. It is expressly contemplated that alternative techniques may be utilized.

FIG. 7 is a flowchart detailing the steps of a procedure for processing an input transaction in accordance with an illustrative embodiment of the present invention. The procedure begins in step 705 and continues to step 710 where new transaction data is entered. This new data may comprise a new batch of financial transactions received from a credit card company, etc. As noted above, various data providers may provide data in a substantially real time manner or may perform batch dumps of data. The procedure continues to step 715 where previously generated regular expressions are applied. That is, previously generated regular expressions that are stored in the vendor map data structure are applied to the transaction information associated with the new transaction data.

A determination is made whether any of the previously generated regular expressions match on the new transaction data in step 720. If there is a match, the procedure branches to step 725 to use the regular expression that matched to identify the vendor associated with the new transaction. The model is then updated in step 730 before completing in step 735. Updating the model illustratively comprises re-executing procedure 500 to determine if any refinements should be made to the regular expressions. In an illustrative embodiment, after every transaction is processed, the model is updated. However, in alternative embodiments, the model may be updated on a less frequent basis. As such, the description of updating the model after every transaction should be taken as exemplary only.

However, if in step 720 it is determined that no match occurs, the procedure then branches to step 740 to wait for a human to label the transaction. Once a human labels the transaction, the appropriate vendor label is applied in step 745 and the procedure continues to step 730.
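The flow of procedure 700 may be sketched, under stated assumptions, as follows. The vendor map is assumed to be a dictionary from vendor name to a list of regular expression strings; the description is padded with one space on each side so that the literal spaces in an expression such as "(.*) BEST BUY (.*) CA (.*)" can match at the string boundaries; and `label_fn` stands in for the human labeling step. The names `identify_vendor`, `process_transaction`, `history`, and `label_fn` are hypothetical, and the re-execution of procedure 500 in step 730 is represented only by recording the labeled description.

```python
import re


def identify_vendor(description, vendor_map):
    """Steps 715-725: return the first vendor whose expression matches, else None."""
    padded = f" {description} "  # assumption: pad so edge spaces can match
    for vendor, patterns in vendor_map.items():
        for pattern in patterns:
            if re.fullmatch(pattern, padded):
                return vendor
    return None


def process_transaction(description, vendor_map, history, label_fn):
    """Identify the vendor, falling back to a human-supplied label."""
    vendor = identify_vendor(description, vendor_map)
    if vendor is None:
        vendor = label_fn(description)  # steps 740-745: wait for a human label
    # Step 730 (placeholder): keep the labeled example for a later model update.
    history.setdefault(vendor, []).append(description)
    return vendor
```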

FIG. 8 is a flowchart detailing the steps of the procedure for processing a transaction in accordance with an illustrative embodiment of the present invention. The procedure 800 begins in step 805 and continues to step 810 where transaction data is entered. Similar to procedure 700, the new data may represent batch inputted data or substantially real time data received. The normal data processing occurs, as described above in procedure 700, and the appropriate regular expression is applied to determine which vendor is associated with the particular transaction. Once the vendor is identified, vendor to category mapping information may be applied, in step 820, to associate a particular financial category with the received transaction. The procedure 800 then completes in step 825.
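As a simple, assumed illustration of step 820, the vendor to category mapping information may be held as a lookup table. The category names, the fallback value, and the function name `categorize` are hypothetical and are not drawn from any particular chart of accounts.

```python
def categorize(vendor, vendor_to_category, default="Uncategorized"):
    """Step 820: look up the financial category for an identified vendor."""
    return vendor_to_category.get(vendor, default)


# Hypothetical mapping for illustration only.
example_mapping = {
    "Best Buy": "Office Equipment",
    "Shell": "Fuel",
}
```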

By automatically associating the input transaction with the vendor according to the one or more embodiments described herein, human intervention is not required to properly classify or label the input transaction. Advantageously, throughput at the financial reporting system is increased. Therefore, the one or more embodiments described herein provide an improvement in the existing technological field associated with financial reporting systems since an input transaction received at the financial reporting system may be automatically and systematically associated with a vendor.

In addition, the one or more embodiments described herein have a practical application since the financial reporting system may not have to rely on human intervention and the input transaction may be automatically and systematically associated with a vendor.

It should be noted that the description contained herein is exemplary and that it is expressly contemplated that alternative embodiments are possible. Therefore, examples, labels, titles, and structures described herein should be taken as exemplary. As will be appreciated by those skilled in the art, differing software constructs may be utilized to achieve the same functionality. Therefore, the description contained herein should be viewed as exemplary.

What is claimed is:
 1. A method for vendor identification for received transaction data, the method comprising the steps of: receiving, at a computer system comprising a processor and a memory, new transaction data associated with an unknown vendor; comparing, by the computer system, the new transaction data with one or more regular expressions for each of a plurality of known vendors, wherein the one or more regular expression for each of the plurality of known vendors is stored in a vendor data structure, and wherein each of the one or more regular expressions is unique in the vendor data structure and determined to be unique utilizing a clustering technique that generates a different number of clusters for historical data until the regular expression is determined to be unique in the vendor data structure; determining, by the computer system, whether a single regular expression, of the one or more regular expressions for each of the plurality of known vendors, matches the new transaction data; in response to a match between the single regular expression and the new transaction data, determining that a corresponding vendor, corresponding to the single regular expression, is the unknown vendor; and in response to no match between the one or more regular expressions, for each of the plurality of known vendors, and the new transaction data: determining that input is required to identify the unknown vendor, receiving the input identifying the unknown vendor as a particular vendor, and updating the vendor data structure with a new entry that associates the new transaction data with the particular vendor.
 2. The method of claim 1 wherein processing throughput, at the computer system, increases based on determining the corresponding vendor or identifying the particular vendor.
 3. The method of claim 1 wherein the clustering technique is based on a Levenshtein distance.
 4. The method of claim 1 wherein the new transaction data comprises credit card transactions.
 5. A system comprising: a computer having a processor, the processor executing a transaction management software, the transaction management software configured to: receive, over a network, new transaction data associated with an unknown vendor; compare the new transaction data with one or more regular expressions for each of a plurality of known vendors, wherein the one or more regular expression for each of the plurality of known vendors is stored in a vendor data structure; determine whether a single regular expression, of the one or more regular expressions for each of the plurality of known vendors, matches the new transaction data; in response to a match between a single regular expression and the new transaction data, determine that a corresponding vendor, corresponding to the single regular expression, is the unknown vendor; and in response to no match between the one or more regular expressions, for each of the plurality of known vendors, and the new transaction data: determine that input is required to identify the unknown vendor, receive the input identifying the unknown vendor as a particular vendor, and update the vendor data structure with a new entry that associates the new transaction data with the particular vendor.
 6. The system of claim 5 wherein processing throughput, at the system, increases based on determining the corresponding vendor or identifying the particular vendor.
 7. The system of claim 5 wherein each of the one or more regular expressions, generated for each of the plurality of known vendors, is unique within the vendor data structure and generated using a clustering technique.
 8. The system of claim 7 wherein the clustering technique is based on a Levenshtein distance.
 9. The method of claim 5 wherein the new transaction data comprises credit card transactions.